Sequence variants in COL4A1 and COL4A2 genes in Ecuadorian families with keratoconus.

Purpose: Keratoconus (KTCN) is a non-inflammatory, usually bilateral disorder of the eye which results in a conical shape and progressive thinning of the cornea. Several studies have suggested that genetic factors play a role in the etiology of the disease. Several loci were previously described as possible candidate regions for familial KTCN; however, no causative mutations in any genes have been identified for any of these loci. The purpose of this study was to evaluate the role of the collagen genes collagen type IV, alpha-1 (COL4A1) and collagen type IV, alpha-2 (COL4A2) in KTCN in Ecuadorian families.
Methods: COL4A1 and COL4A2 in 15 Ecuadorian KTCN families were examined with polymerase chain reaction amplification, and direct sequencing of all exons, the promoter, and intron-exon junctions was performed.
Results: Screening of COL4A1 and COL4A2 revealed numerous alterations in coding and non-coding regions of both genes. We detected three missense substitutions in COL4A1: c.19G>C (Val7Leu), c.1663A>C (Thr555Pro), and c.4002A>C (Gln1334His). Five non-synonymous variants were identified in COL4A2: c.574G>T (Val192Phe), c.1550G>A (Arg517Lys), c.2048G>C (Gly683Ala), c.2102A>G (Lys701Arg), and c.2152C>T (Pro718Ser). None of the identified sequence variants completely segregated with the affected phenotype. The Gln1334His variant was predicted to be possibly damaging to protein function and structure.
Conclusions: This is the first mutation screening of the COL4A1 and COL4A2 genes in families with KTCN and linkage to a locus close to these genes. Analysis of COL4A1 and COL4A2 revealed no mutations, indicating that other genes are involved in KTCN causation in Ecuadorian families.

Keratoconus (KTCN, OMIM 148300) is a noninflammatory, usually bilateral disorder of the eye, characterized by progressive thinning and protrusion of the central cornea which results in altered refractive powers and loss of visual acuity [1]. The prevalence of the disease is estimated to be 1 in 2,000 individuals, and KTCN is the most common ectatic disorder of the cornea [1]. KTCN afflicts males and females in all ethnic groups [1]. Signs and symptoms depend on the stage of disease, with the first signs usually appearing in the third decade of life [1,2]. The cause of KTCN is still unknown; both genetic and environmental factors seem to play a role in its etiology. Although most cases of KTCN are isolated, an association with many syndromes, such as Down syndrome [3], Ehlers-Danlos syndrome [4], and Leber congenital amaurosis [5], has been described. Furthermore, extensive studies have shown an association between KTCN and constant eye rubbing [6], contact lens wear [7], or atopy [8]. Usually, KTCN is a sporadic disorder, but a positive family history has been observed in 6%-8% of cases [1]. An autosomal dominant inheritance pattern with reduced penetrance has been suggested in 90% of patients with familial KTCN [9,10].
We have demonstrated evidence of linkage to a novel locus at 13q32 [21]. Collagen type IV, alpha-1 (COL4A1; OMIM 120130) and collagen type IV, alpha-2 (COL4A2; OMIM 120090) map in close proximity to that locus. The COL4A1 and COL4A2 genes are organized in a head-to-head conformation [22]. The two genes share a common promoter and are transcribed in opposite directions [23]. The COL4A1 gene is located on the minus strand and consists of 52 exons, while the COL4A2 gene is on the opposite strand and consists of 48 exons. They encode two of the six collagen type IV chains, α1 and α2 (1,669 and 1,712 amino acids, respectively), which form a heterotrimeric collagen type IV molecule (α1α1α2), which is found in the [...]
Haplotype analysis: PEDSTATS [31] was used to verify the structure of the KTCN-014 family and to identify potential Mendelian inconsistencies in the inheritance of single nucleotide polymorphisms (SNPs) in COL4A1 and COL4A2. To determine the full haplotypes inherited along with the substitutions occurring in affected individuals, haplotypes of the sequence variants observed in that region were reconstructed using SimWalk2 [32,33]. Allele frequencies were set as equal. The location of genetic markers was determined on the basis of the Rutgers combined linkage-physical map of the human genome [34], either directly or by interpolation. The haplotype diagram was generated with HaploPainter [35].
Statistical analysis for Gln1334His substitution: The difference in the distribution of the Gln1334His substitution between affected and unaffected individuals in family KTCN-014 was analyzed by Fisher's Exact Test for Count Data. Similarly, 25 affected individuals from the remaining KTCN families were compared with 64 Ecuadorian control individuals using Fisher's Exact Test. The difference between the examined groups was considered significant if the probability value (p) did not exceed 0.05.
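The comparison described above can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. The function name `fisher_exact_two_sided` is illustrative, and the counts in the usage lines are hypothetical placeholders, since carrier counts per group are not listed in this section; the software actually used for the test is not assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts (NOT from the study):
# rows = affected / unaffected, columns = carriers / non-carriers.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}; significant at 0.05: {p_value <= 0.05}")
```

The final comparison uses `<=` to mirror the stated criterion that p must not exceed 0.05.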
Prediction of effect of amino acid substitutions on protein function: The potential impact of amino acid substitutions on the COL4A1 and COL4A2 proteins was examined using PolyPhen, SIFT, PMUT, PANTHER, and SNAP tools.
The PolyPhen tool predicts which missense substitutions affect the structure and function of a protein, and uses Position-Specific Independent Counts software to assign profile scores. These scores express the likelihood of a given amino acid occurring at a specific position, compared to the likelihood of this amino acid occurring at any position (background frequency) [36].
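The idea of a profile score as a position-specific likelihood relative to a background frequency can be sketched as a simple log ratio. This is a deliberately simplified illustration and not the Position-Specific Independent Counts algorithm itself; the alignment column, the uniform background, the pseudocount, and the function name `profile_score` are all assumptions made for the example.

```python
from collections import Counter
from math import log


def profile_score(column, residue, background, pseudocount=1.0):
    """Log ratio of a residue's frequency in one alignment column
    to its background frequency (simplified, unweighted sketch)."""
    counts = Counter(column)
    alphabet_size = len(background)
    # Pseudocounts keep unseen residues from producing log(0).
    position_freq = (counts[residue] + pseudocount) / (
        len(column) + pseudocount * alphabet_size
    )
    return log(position_freq / background[residue])


# Hypothetical inputs (NOT data from the study): one alignment column
# across eight sequences and a uniform 20-residue background.
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
background = {aa: 1 / len(amino_acids) for aa in amino_acids}
column = "QQQQQQQH"

wild_type = profile_score(column, "Q", background)
variant = profile_score(column, "H", background)
print(f"Q: {wild_type:.3f}  H: {variant:.3f}  difference: {wild_type - variant:.3f}")
```

A large difference between the wild-type and variant scores indicates that the variant residue is rarely seen at a position where the wild-type residue is common.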
The SIFT analytic tool evaluates conserved positions on the basis of sequence homology and calculates a score for the amino acid change at a particular position. A substitution with a score of <0.05 is considered pathogenic, with a predicted effect on protein structure [37].
PMUT calculates the pathological significance of a non-synonymous amino acid substitution using neural networks (NN). An NN output >0.5 is considered to be deleterious [38].
PANTHER estimates the likelihood that a particular amino acid change affects protein function. On the basis of an alignment of evolutionarily related proteins, it generates the substitution Position-Specific Evolutionary Conservation (subPSEC) score. subPSEC values range from 0 (neutral) to about −10 (most likely to be deleterious). The value −3 is the cutoff point for functional significance, and corresponds to a Pdeleterious of 0.5. If the substitution occurs at a position not appearing in the multiple sequence alignment, a subPSEC score cannot be calculated and the change is not likely to be pathogenic [39,40].
The SNAP tool predicts the functional consequences of exchanging amino acids using evolutionary conservation and structure/function relationships. The SNAP output gives a prediction (neutral or non-neutral) and the expected accuracy [41].
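The cutoffs quoted above can be collected into a short sketch that turns raw scores into per-tool calls. The thresholds come from the text; the logistic mapping from subPSEC to Pdeleterious is an assumption chosen so that a subPSEC of −3 gives 0.5, the handling of values exactly at a cutoff is also an assumption, and the function names and example scores are hypothetical.

```python
from math import exp

# Cutoffs as stated in the text.
SIFT_CUTOFF = 0.05     # score < 0.05 is considered pathogenic
PMUT_CUTOFF = 0.5      # NN output > 0.5 is considered deleterious
SUBPSEC_CUTOFF = -3.0  # subPSEC of -3 corresponds to Pdeleterious of 0.5


def p_deleterious(subpsec):
    """Assumed logistic mapping from subPSEC to Pdeleterious (0.5 at -3)."""
    return 1.0 / (1.0 + exp(subpsec - SUBPSEC_CUTOFF))


def classify(sift=None, pmut=None, subpsec=None):
    """Return a per-tool call for one substitution; None means no score."""
    calls = {}
    if sift is not None:
        calls["SIFT"] = "deleterious" if sift < SIFT_CUTOFF else "tolerated"
    if pmut is not None:
        calls["PMUT"] = "deleterious" if pmut > PMUT_CUTOFF else "neutral"
    if subpsec is not None:
        # More negative than the cutoff means Pdeleterious above 0.5.
        calls["PANTHER"] = "deleterious" if subpsec < SUBPSEC_CUTOFF else "neutral"
    return calls


# Hypothetical scores for illustration only (NOT values from Table 3).
print(classify(sift=0.20, pmut=0.31, subpsec=-1.8))
print(round(p_deleterious(-3.0), 2))
```

Leaving `subpsec` unset models the case in which the position is absent from the alignment and no PANTHER call is made.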

Forty-eight members of 15 Ecuadorian families and 64 Ecuadorian control subjects were included in the study. Twenty-three individuals from family KTCN-014, two affected individuals from each of the families KTCN-011, 015, 019, 020, 021, 024, 025, 030, 031, 034, and 035, and one patient from each of KTCN-05, 013, and 017 were examined. Screening of exon/intron junctions in COL4A1 and COL4A2 revealed numerous sequence variants in the surrounding non-coding sequences (71 and 86, respectively), including single nucleotide changes, insertions, and deletions. All screening results are summarized in Table 2.
The sequencing of the genomic region containing the common promoter of COL4A1 and COL4A2 revealed no sequence changes.
Statistical analysis and in silico predictions: PolyPhen analyses of non-synonymous changes in COL4A1 and COL4A2 predicted that only the Gln1334His variant in COL4A1 was possibly damaging for protein function and structure (Table 3). The multiple sequence alignment of COL4A1 orthologs shows that the amino acid glutamine at position 1,334 is conserved throughout the analyzed species (Figure 1). The Gln1334His substitution was observed more frequently in patients than in healthy individuals in family KTCN-014 (p=0.056). There was no difference in the c.4002A>C allele distribution between the analyzed affected individuals from the remaining KTCN families and the Ecuadorian control subjects (p=0.17).
The SIFT, PMUT, PANTHER, and SNAP analyses defined all missense amino acid substitutions in COL4A1 and COL4A2 as neutral/tolerated and lacking any effect on protein function. All prediction results are summarized in Table 3.
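The relation between the coding-sequence positions and the amino acid positions of the missense variants reported in this study can be checked with a minimal sketch. It assumes the usual convention that coding position 1 is the A of the ATG start codon; the regular expression and the function name `codon_of` are illustrative and cover only simple substitutions.

```python
import re

# Matches simple coding-sequence substitutions such as "c.4002A>C".
CDS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_of(cdna_change):
    """Return (codon number, position within codon) for a c. substitution."""
    match = CDS_SUBSTITUTION.match(cdna_change)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {cdna_change!r}")
    position = int(match.group(1))
    # Positions 1-3 form codon 1, positions 4-6 form codon 2, and so on.
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


# Variant / residue pairs as reported in the study.
reported = {
    "c.19G>C": 7, "c.1663A>C": 555, "c.4002A>C": 1334,     # COL4A1
    "c.574G>T": 192, "c.1550G>A": 517, "c.2048G>C": 683,   # COL4A2
    "c.2102A>G": 701, "c.2152C>T": 718,                    # COL4A2
}

for change, residue in reported.items():
    codon, offset = codon_of(change)
    status = "consistent" if codon == residue else "MISMATCH"
    print(f"{change}: codon {codon}, base {offset} of 3 ({status})")
```

Under that convention, each reported nucleotide position falls within the codon of the reported amino acid position.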
Haplotype reconstruction: Haplotypes of sequence variants observed in family KTCN-014 are shown in Figure 2. The coding sequence variants in COL4A1 are flanked by markers rs13260 and col4a1_snp2. Exons of COL4A2 are localized between rs35466678 and rs422733.
KTCN-014 consists of two family branches. Distinct haplotypes in the branches were identified (Figure 2). In the first one, initiated by parents KTCN-93 and KTCN-01, six subjects with KTCN had the same haplotype in the COL4A1 region, extending from rs13260 to col4a1_snp1. Three unaffected individuals, KTCN-13, KTCN-14, and KTCN-22, share that part of the haplotype with their affected relatives. One of four variants in this region, rs3742207, causes a change in the protein sequence, replacing Gln in position 1334 with His (Gln1334His). That haplotype region, from rs13260 to col4a1_snp1, represents a short fragment of the haplotype which covers the whole COL4A1 and COL4A2 sequence in KTCN-03, KTCN-05, KTCN-06, and KTCN-14. In addition, individuals KTCN-07, KTCN-09, KTCN-13, KTCN-22, and KTCN-23 share the rs874203-rs422733 region (Figure 2, pink bars). For markers rs13260-col4a1_snp1, a different haplotype was observed in the second family branch, initiated by parents KTCN-92 and KTCN-16. This haplotype covered the entire length of the analyzed region, and was identified in all affected individuals and in KTCN-21, whose phenotype was unknown. Subject KTCN-17 had the same allele pattern for markers rs13260-col4a1_snp1 as individuals from the first branch of the family. However, in this case, analysis indicated that these markers were inherited from KTCN-92, who is unrelated to KTCN-93 and KTCN-01.

DISCUSSION
To our knowledge, this is the first report describing complete sequence analysis of the coding regions and the exon-intron boundaries of COL4A1 and COL4A2 in families with KTCN. Previous studies have revealed a correlation between KTCN development and histopathological alterations in the structure of the corneal stroma and basement membrane, including a loss of collagen concentration [42] and rearrangement of collagen fibers [26]. Moreover, several types of collagen, including collagen type IV, have been identified in the cornea [24], and COL4A1 and COL4A2 expression has been detected in the human cornea [29]. Finally, we had mapped a locus for KTCN to 13q32, in close proximity to which COL4A1 and COL4A2 are localized [21]. Given that information, we hypothesized that the COL4A1 and COL4A2 genes are good candidates for causing KTCN in families with linkage to that locus. Different studies have revealed several loci and a few candidate genes for familial KTCN. The first gene proposed as playing a significant role in KTCN pathogenesis was the VSX1 (visual system homeobox 1, OMIM 605020) gene. It was suggested that a few disease-causing mutations were present in this gene [43,44], but recent studies have not confirmed these findings [21,45-47]. Next, a heterozygous genomic 7-bp deletion in intron 2 of SOD1 (superoxide dismutase 1; OMIM 147450) was identified in two families with KTCN [48,49]. In contrast, other studies have shown that mutations in this gene are not associated with KTCN pathogenesis [21,47]. Genetic analyses of the COL4A3, COL4A4, COL8A1, and COL8A2 genes have revealed no pathogenic mutations in patients with KTCN, indicating that other genetic factors cause the disease [50-52].
We identified several single base pair substitutions in the coding regions of COL4A1 and COL4A2, including one novel heterozygous change, c.3693G>A in exon 42 of COL4A1. None of the detected alterations segregated fully with the affected phenotype in the analyzed members of the Ecuadorian KTCN families. Among the identified missense substitutions in COL4A1, one change, c.4002A>C (p.Gln1334His), was observed more frequently in KTCN patients than in healthy individuals in family KTCN-014. However, no statistically significant association of this change with familial disease could be shown (p=0.056), and no difference in the c.4002A>C allele distribution was found between the analyzed affected individuals from the remaining KTCN families and the Ecuadorian control subjects (p=0.17). To predict the impact of the substitutions on the structure and function of the protein, we used several in silico tools. All identified missense substitutions in COL4A1 and COL4A2 were predicted by the SIFT, PMUT, PANTHER, and SNAP tools to have no effect, but PolyPhen defined the Gln1334His change in COL4A1 as possibly damaging. Glutamine at this position is highly conserved in different species. Moreover, this change is present in the collagenous domain of the α1(IV) chain with Gly-X-Y repeats, which plays a role in the assembly of the protein into a triple-helical structure [22]. Replacement of the neutral residue (Gln) with the polar amino acid (His) at the Y position is likely to affect the protein structure. Nevertheless, further studies should be performed to determine the functional significance of this substitution.
To the best of our knowledge, no mutations in COL4A1 have been associated with corneal disease. The spectrum of COL4A1-related disorders includes porencephaly (OMIM 175780) [53-55], Hereditary Angiopathy with Nephropathy, Aneurysm and Muscle Cramps (HANAC; OMIM 611773) [56], and brain small vessel disease with hemorrhage (OMIM 607595) [57]. Recent studies have also revealed an association between mutations in exon 29 of COL4A1 and Axenfeld-Rieger anomaly with leukoencephalopathy and stroke [58]. In our study, none of the previously reported COL4A1 mutations were identified. The absence of these changes in patients with KTCN suggests that they are specific to the above-mentioned disorders only, and are not associated with KTCN in the tested families. To date, no mutations responsible for COL4A2-related human diseases have been reported.
Besides changes identified in the coding regions of COL4A1 and COL4A2, our study revealed numerous alterations in introns and UTRs of both genes, including single base pair substitutions, deletions, and insertions. Fourteen of these were novel and their clinical significance is not known. Each of the changes was observed in affected and healthy individuals in the tested families. Because important functional elements are located in non-coding regions of genes [59] and intronic alterations can result in a deleterious effect on pre-mRNA splicing [60], identification of these sequence variants could be non-accidental. Further research is needed to delineate the role of these sequence variants.
Recent studies have shown that a mouse with a mutation in a splice acceptor site of Col4a1 has ocular dysgenesis. The [...] dispersion, cataracts, and corneal opacifications [61]. Splice acceptor sites are highly conserved regions in different species [56]. We detected no alterations in the splice acceptor site in intron 39 of human COL4A1.
Extended genetic studies conducted in families with KTCN have shown a high level of genetic heterogeneity [62]. The presence of many putative loci supports the hypothesis that KTCN is an oligogenic disease in which accumulation of sequence variants at several loci produces a specific KTCN haplotype and may trigger the phenotypic effect. The absence of mutations in the COL4A1 and COL4A2 genes indicates that other genes are involved in KTCN pathogenesis in Ecuadorian families.